Current Issue: July-September | Volume: 2023 | Issue Number: 3 | Articles: 5
To make toy robots more entertaining, engaging, and intelligent, a voice recognition sensor and voice control system for an intelligent toy robot are proposed. The system adopts an overall architecture comprising a client and a server. The client's camera calibration and data transmission module captures images, computes the camera's internal (intrinsic) and external (extrinsic) parameters, and transmits the images and extrinsic parameter data to the server. From the data received from the client, the server constructs a background image and updates the camera position and orientation in real time to fuse the virtual and real scenes. Through the motion control part of the user interaction module, hearing-impaired children can control the movement and rotation of the smart toy. Experimental results show that the system achieves high communication synchronization and stability, enables high-precision control of the smart toy, and reaches an average frame rate of 30.97 f/s. The system thus offers rich functionality, effective speech recognition, and strong entertainment value.
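The abstract does not include implementation details. As a rough illustration of the client-side step that computes intrinsic and extrinsic camera parameters, the sketch below uses standard OpenCV chessboard calibration; the board geometry, square size, and file names are assumptions, not values from the paper.

```python
# Hypothetical sketch of the client-side calibration step described in the
# abstract: estimate intrinsic and extrinsic camera parameters from images
# of a chessboard pattern. Board geometry and file names are assumptions.
import glob
import cv2
import numpy as np

BOARD = (9, 6)          # inner corners per row/column (assumed)
SQUARE_MM = 25.0        # physical square size in millimeters (assumed)

# 3D reference points of the board corners in the board's own frame
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE_MM

obj_points, img_points = [], []
for path in glob.glob("calib_*.png"):          # captured calibration frames
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Intrinsics (camera matrix K, distortion coefficients) plus one extrinsic
# pose (rotation rvec, translation tvec) per calibration image.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

# The client would then transmit each image together with its extrinsics to
# the server, e.g. {"rvec": rvecs[-1].tolist(), "tvec": tvecs[-1].tolist()}.
```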
(1) Background: To assess and compare speech intelligibility with conventional and universal musician-specific hearing protection devices (HPD); (2) Methods: The sample comprised 15 normal-hearing musicians of both sexes who had been professionals for more than 5 years. They underwent a thorough audiological assessment and free-field audiometry to measure the attenuation levels of three HPD models (musician-specific, silicone, and foam devices). The sentence recognition thresholds in quiet (SRTQ) and in noise (SRTN) were assessed with the Lists of Sentences in Portuguese. User satisfaction with the musician HPD was assessed after 2 months; (3) Results: The conventional HPD had higher mean pure-tone attenuation levels than the musician HPD. No statistically significant differences were found in SRTQ or SRTN between the three HPD types. However, the musician HPD yielded higher mean signal-to-noise ratios and percentages of correctly recognized words in sentences presented in noise than the other HPD. The responses also indicated a positive trend toward satisfaction with the musician-specific HPD; (4) Conclusions: Despite the lack of significant differences in speech intelligibility while wearing the three HPD models in either quiet or noise, the musician-specific HPD provided greater musical sound quality. This reinforces the possibility of effective and adequate use of protection to preserve musicians’ hearing.
English listening practice is an effective way to improve students' English expression and oral communication skills. However, current English teaching methods are too uniform, and teachers do not focus on oral training in the classroom, resulting in low teaching efficiency. Following the educational-ecology principles of wholeness, interaction, balance, and sustainable development, enhancing the synergy of the ecological elements of the English speaking classroom, promoting interactive dialogue among ecological subjects, and regulating classroom behavior help information technology play its advantageous role in reforming English speaking teaching and promote its sustainable development. This paper addresses the current state of English listening teaching, in particular the reduced recognition rate of spoken language in noisy environments, and proposes the principle of a dual-sensor speech recognition system. We design a speech recognition method based on a recurrent neural network that acquires, through sensors, both the weak vibration pressure speech signal at the jaw skin and the air-conducted speech signal produced during vocalization. A deep machine learning algorithm is used for speech recognition in English teaching. A reasonable frame sampling rate is set to acquire the English speech signal; the feature parameters representing the signal are then obtained from linear prediction coefficients and assembled into a speech feature vector, which is trained with a recurrent neural network algorithm. In comparative experiments against commonly used speech recognition algorithms, the proposed algorithm is shown to achieve higher accuracy and faster convergence in English teaching speech recognition.
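The abstract names the pipeline (framing, linear prediction coefficients, recurrent network) without code. As a minimal sketch of that feature stage under assumed parameters, the snippet below frames a speech signal, computes LPC coefficients per frame, and feeds the resulting feature sequence to a small recurrent network; the sampling rate, frame size, LPC order, hidden size, and class count are all assumptions.

```python
# Hypothetical sketch of the described pipeline: frame the signal, extract
# LPC feature vectors, and train a recurrent neural network on the sequence.
# Sampling rate, frame size, LPC order, and class count are assumptions.
import librosa
import numpy as np
import torch
import torch.nn as nn

SR, FRAME, HOP, LPC_ORDER, N_CLASSES = 16000, 512, 256, 12, 10

def lpc_features(wav_path):
    y, _ = librosa.load(wav_path, sr=SR)
    frames = librosa.util.frame(y, frame_length=FRAME, hop_length=HOP).T
    # One LPC coefficient vector per frame (drop the leading constant 1.0)
    return np.stack([librosa.lpc(f, order=LPC_ORDER)[1:] for f in frames])

class SpeechRNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(LPC_ORDER, 64, batch_first=True)
        self.head = nn.Linear(64, N_CLASSES)

    def forward(self, x):                 # x: (batch, time, LPC_ORDER)
        _, h = self.rnn(x)                # h: (1, batch, 64)
        return self.head(h[-1])           # class logits

model = SpeechRNN()
feats = torch.tensor(lpc_features("utterance.wav"), dtype=torch.float32)
logits = model(feats.unsqueeze(0))        # add a batch dimension
```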
Existing research on emotion recognition commonly uses the mel spectrogram (MelSpec) and the Geneva minimalistic acoustic parameter set (GeMAPS) as acoustic parameters for learning audio features. MelSpec can represent the time-series variation of each frequency but cannot manage multiple types of audio features; GeMAPS, on the other hand, can handle multiple audio features but provides no information on their time-series variation. This study therefore proposes a speech emotion recognition model based on a multi-input deep neural network that learns these two audio features simultaneously. The proposed model comprises three parts: one that learns MelSpec in image format, one that learns GeMAPS in vector format, and one that integrates them to predict the emotion. Additionally, a focal loss function is introduced to address the data imbalance among the emotion classes. Recognition experiments demonstrate weighted and unweighted accuracies of 0.6657 and 0.6149, respectively, which are higher than or comparable to those of existing state-of-the-art methods. In particular, the proposed model significantly improves the recognition accuracy of the emotion “happiness”, which previous studies have found difficult to identify owing to limited data. The proposed model can therefore effectively recognize emotions from speech and, with further development, can be applied in practice.
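To make the three-part architecture concrete, the sketch below shows one plausible realization: a CNN branch for MelSpec in image format, an MLP branch for the GeMAPS vector, a fusion head that integrates them, and a focal loss for the class imbalance. The layer sizes, the 88-dimensional feature vector, and the four emotion classes are assumptions, not details from the paper.

```python
# Hypothetical sketch of the described multi-input model: a CNN branch for
# MelSpec (image), an MLP branch for GeMAPS (vector), a fusion head, and a
# focal loss. Layer sizes, feature dimension, and class count are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiInputSER(nn.Module):
    def __init__(self, gemaps_dim=88, n_classes=4):
        super().__init__()
        self.cnn = nn.Sequential(            # learns MelSpec as an image
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.mlp = nn.Sequential(            # learns GeMAPS as a vector
            nn.Linear(gemaps_dim, 64), nn.ReLU())
        self.head = nn.Linear(32 + 64, n_classes)  # integrates both branches

    def forward(self, melspec, gemaps):
        fused = torch.cat([self.cnn(melspec), self.mlp(gemaps)], dim=1)
        return self.head(fused)

def focal_loss(logits, target, gamma=2.0):
    # Down-weights well-classified examples so rare emotions contribute more.
    ce = F.cross_entropy(logits, target, reduction="none")
    pt = torch.exp(-ce)                      # probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()

model = MultiInputSER()
mel = torch.randn(8, 1, 128, 256)            # batch of MelSpec "images"
gem = torch.randn(8, 88)                     # batch of GeMAPS vectors
loss = focal_loss(model(mel, gem), torch.randint(0, 4, (8,)))
```

With gamma = 0 the focal loss reduces to ordinary cross-entropy; raising gamma shrinks the contribution of easy examples, which is what helps the underrepresented “happiness” class.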
Respiratory sounds have been used as a noninvasive and convenient means of estimating respiratory flow and tidal volume. However, current methods require calibration, which makes them difficult to use in a home environment. A respiratory sound analysis method is proposed to qualitatively estimate tidal volume levels during sleep. Respiratory sounds are filtered and segmented into one-minute clips, and all clips are clustered into three categories (normal breathing, snoring, and uncertain) with agglomerative hierarchical clustering (AHC). Formant parameters are then extracted to classify snoring clips into simple and obstructive snoring with the K-means algorithm. For simple snoring clips, the tidal volume level is calculated from the snoring duration; for obstructive snoring clips, it is calculated from the maximum breathing pause interval. The performance of the proposed method is evaluated on an open dataset, PSG-Audio, in which full-night polysomnography (PSG) and tracheal sound were recorded simultaneously. The calculated tidal volume levels are compared with the corresponding lowest nocturnal oxygen saturation (LoO2) data. Experiments show that the proposed method calculates tidal volume levels with high accuracy and robustness.
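As a rough sketch of the two-stage clustering the abstract describes, the snippet below segments a recording into one-minute clips, groups them into three categories with AHC, and then splits the snoring clips into two groups by K-means on formant-style features estimated from LPC roots. The per-clip descriptors (log-energy and spectral centroid), the LPC order, and the assumption that cluster 0 is the snoring category are illustrative choices, not the paper's.

```python
# Hypothetical sketch of the described two-stage clustering: one-minute
# clips -> AHC into three categories, then K-means on formant features to
# separate simple vs. obstructive snoring. Feature choices are assumptions.
import numpy as np
import librosa
from sklearn.cluster import AgglomerativeClustering, KMeans

SR = 16000

def clip_features(clip):
    # Simple per-clip descriptors for the AHC stage (assumed choice)
    energy = np.log(np.mean(clip ** 2) + 1e-10)
    centroid = librosa.feature.spectral_centroid(y=clip, sr=SR).mean()
    return [energy, centroid]

def formants(clip, order=12, n_formants=2):
    # Estimate formant frequencies from the angles of LPC polynomial roots
    roots = np.roots(librosa.lpc(clip, order=order))
    roots = roots[np.imag(roots) > 0]
    freqs = np.sort(np.angle(roots) * SR / (2 * np.pi))
    return freqs[:n_formants]

y, _ = librosa.load("tracheal_sound.wav", sr=SR)
clips = [y[i:i + 60 * SR] for i in range(0, len(y) - 60 * SR + 1, 60 * SR)]

# Stage 1: normal breathing / snoring / uncertain
ahc = AgglomerativeClustering(n_clusters=3)
categories = ahc.fit_predict(np.array([clip_features(c) for c in clips]))

# Stage 2: simple vs. obstructive snoring (assume cluster 0 is snoring here)
snore_clips = [c for c, k in zip(clips, categories) if k == 0]
labels = KMeans(n_clusters=2, n_init=10).fit_predict(
    np.array([formants(c) for c in snore_clips]))
```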